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0^ ■ Abstract. Identification of local structure in intensive data - such as time series, im- 

w^ I ages, and higher dimensional processes - is an important problem in astronomy. Since 

the data are typically generated by an inhomogeneous Poisson process, an appropriate 
model is one that partitions the data space into cells, each of which is described by a 
homogeneous (constant event rate) Poisson process. It is key that the sizes and loca- 
tions of the cells are determined by the data, and are not predefined or even constrained 
^.,^' to be evenly spaced. For one-dimensional time series, the method amounts to Bayesian 

r^ I changepoint detection. Three approaches to solving the multiple changepoint problem 

Oh' are sketched, based on: (1) divide and conquer with single changepoints, (2) maximum 

posterior for the number of changepoints, and (3) cell coalescence. The last method 
, - starts from the Voronoi tessellation of the data, and thus should easily generalize to 

rS ' spaces of higher dimension. 
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I INTRODUCTION: THE DATA ANALYSIS 

PROBLEM 

Developments in detector technology for high energy astrophysics^ have led to 
observation systems capable of reporting accurate arrival times for individual pho- 
tons. These times, while not binned, are quantized in microsecond-scale units I like 
to call "ticks" - since they are in fact generated by the ticking of the computer clock 
on-board the spacecraft. In the approximation where the ticks are short compared 
to time scales of interest, we very accurately have a Poisson process. Note that, 
depending on the nature of the variability of the process, different mathematical 
models apply, as indicated in the table below. 





Nature of 
Variability 


Mathematical 
Model Process 


Constant 

Deterministic 

Random 


Homogeneous Poisson 

Inhomogeneous Poisson 

Doubly Stochastic Poisson 
(Cox Process) 





A number of mathematical references [20,36,1,2,40] describe the nature of spa- 
tial Poisson processes. Time series are the ID special case, consisting of streams of 
numbers representing photon detection times. In this context, the Poisson distribu- 
tion is so accurate that it is hardly an approximation. Statisticians seem to have a 
hard time understanding this point. I believe the culprit is the fact that the Poisson 
process is usually taught as the limit of a binomial process or the like. Technically, 
our data do comprise a finite Bernoulli lattice [40], since typically more than one 
event cannot be recorded at a given tick. (See [8] for an interesting discussion of a 
lattice theory of quantum fields.) But the rates in units of events^ per tick are so 



'^' The term high energy astrophysics is used loosely for both the study of astronomical objects 
which produce and emit large amounts of energy, and for observations of radiation consisting of 
high-energy photons, e.g. X-rays and gamma-rays. Often the two meanings coincide. 
^' We use the term event for photon detections, or other data points. 
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low that the probabihty of such multiple events is very low. 

In practice, the most significant imperfection is a departure, not from the as- 
sumed distribution, but from the assumption of independence of the process at 
different times. Photon detectors always have a finite dead time - an interval after 
the detection of one event during which another photon cannot be recorded, either 
because the detector mechanism itself is temporarily paralyzed, or the data system 
is too busy processing the event. This yields interdependences in the detection of 
photons close together in time. 

The discrete nature of photon counting data is most often considered a challenge. 
Many analysts first bin their data and then apply standard methods. Further, 
the notion is rife that not only are bins necessary, but that they must be large 
and contain enough events to produce a "statistically significant sample." This is 
wrong and wasteful. Analysis can be carried out directly on the event data. It is 
my belief that the discreteness and utter simplicity of the fundamental event - a 
photon was detected or it wasn't - are a big advantage. This simplicity and the 
fact that the observational errors accurately follow a distribution with essentially 
no free parameters*^ mean that the posterior marginalization can be carried out 
exactly, at least for some of the nuisance parameters [see Eqs. (4) and 5]. 

Happily, the simplicity of the basic events allows almost immediate generalization 
of one-dimensional results to data spaces of higher dimension. Simple examples are 
cases where the photons, in addition to being timed, are also tagged with spectral 
and spatial information - i. e. energies of photons and their location on the sky. 
Processes that sample the density of events in spaces of various dimension (denoted 
D) include: 

• time series {D = 1) 

• other sequential data {D = 1; e.g., genetic sequences) 

• images {D = 2) 

• time sequences of images {D = 3) 

• time sequences of spectra {D = 2) 

• points in parameter spaces (any D) 

The corresponding data types have in common that they comprise sets of points in 
some well defined space. Since the underlying process defines the intensity of some 
physical quantity over the space, we call these intensive data. The actual data 
can consist of a list of points, or the number of points in pre-selected bins {e.g. 
image pixels), or other data modes. A problem of broad interest is the detection 
of local^ structures in such data. Pulses, or other time-domain structures in time 



^) The Poisson rate parameter is pretty closely nailed at the local event rate. 

^' I use the term local to distinguish features of limited extent from global features, such as 

periodic signals extending over the whole of the data space. 



series data, features in images and clusters in parameter spaces of various dimension 
(classification) are examples. 

For one-dimensional time series, the method of Bayesian Blocks approximates the 
signal as a piecewise constant Poisson process [35] . Generalizing this representation 
to a space, denoted 5", of arbitrary dimension, the basic problem can be phrased 
as the following relatively straightforward Bayesian Maximum a posteriori (MAP) 
problem: 



Consider all partitions of S into subvolumes, or cells. 
Model the event rate within each cell as a homogeneous Poisson 

Process. 
Among all such partitioned models, which is the most likely? 



We determine the parameters in order of how fundamental they are to the nature 
of the model. The piecewise constant Poisson nature of the model is assumed, so 
the most fundamental parameter is N^p, the number of changepoints. This is deter- 
mined by marginalizing all of the other parameters - the locations of the change- 
points and the Poisson rates between them. Then the locations of the changepoints 
are determined. Finally, the block event rates are determined; trivially, the MAP 
value is just the ratio of A^, the number of events in the cell, to V , its volume. 

The resulting evidence in favor of this model for a single cell, called the marginal 
likelihood., and defined below in Eq. (4), is a simple function of V and A^ [below, 
Eqs. (4) and (5), and also see [35]]. For binned data and other data modes, and for 
other distributions than the Poisson one assumed here, the likelihood is similarly an 
explicit function of the same two parameters. We assume the cells are independent, 
so the total likelihood is the product of the likelihoods of the cells (see Section 111). 



II EXACT BAYES FACTORS 

For the multiple changepoint problem we need to evaluate the posterior proba- 
bility for piecewise constant models, given the data. An important simplification is 
that the marginal likelihoods and posterior probabilities for such models factor into 
the product of the same quantities for each independent segment of the model. We 
refer to these segments as blocks in ID contexts, and as cells in higher dimensions. 
This section gives the computation of the posterior for a single cell, which can then 
be used in various ways - such as the evaluation of Bayes factors for comparison of 
two or more models. The general form of Bayes factor comparing two models Mi 
and M2, given data D {c.f. [15], eq. (6.4)} is: 

R f * IM wl PJOm) Ip{\2\M,}p(D\^2,M,_}d\, 

Bayes factor(Af.; Af.) = -^^^^ = Jp(Ai|Af.)p(fl|A., A/O^'A. <'> 
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where p(A|M) is the prior on A andp(i5|A, M) is the hkehhood. Given the discussion 
above, it is easy to compute the appropriate factor for a Poisson model with rate 
parameter A (units: events per unit volume) for a set of events in some block or cell. 
The Poisson Likelihood, obtained by multiplying likelihoods for individual sample 
bins, is 

L(iV|A,y) = e^^^A^ (2) 

The usual Poisson factorial does not appear because the number of counts in a tick 
is or 1. Marginalizing A using the conjugate Poisson prior ( [15]; Section 2.7, p. 
49-50) 

P(A) = T^^e-^^A^-i , (3) 

r(a) 

the contribution to the Bayes factor for a cell - often called the marginal likelihood 
- is 

P(M|Z,) = fMA)L(A'|A,V'MA=j^^I(^^ (4) 

Based on a prior assigning a uniform distribution to the probability of occurrence 
of a single event, the marginal likelihood 

T(N + 1)T(V - N + I) 
P(M\D) = ^ ^ :^ ^v^ij 

was derived in [35]. A function of the same two sufficient statistics, A^ and V, it 
behaves similarly to the marginal likelihood in Eq. (4), but was found less accurate 
in simulation studies such as the one discussed below. 



Ill THREE APPROACHES TO THE MULTIPLE 
CHANGEPOINT PROBLEM 

We now have to evaluate the above marginal likelihood for each segment of a 
piecewise homogeneous Poisson model consisting of successive, independent^ blocks 
of data, and cobble the results together. The next section describes three ways to 
do this. The first two seem effective in ID, but have problematical extensions to 
higher dimensions. The third was inspired by its simplicity in 2D and 3D. All 
three methods are demonstrated on the same synthetic ID data. 

A An Iterative Approach: Top Down 



^' That is, due to a fundamental property of the Poisson distribution, random variables corre- 
sponding to counts in successive blocks (or different cells in general) are independent. This fact 
should not be confused with independence of the actual Poisson rates, which in general does not 
hold. 



As detailed in [35], the formulas above allow easy computation of the Bayes 
factor comparing (1) the two-block model in which the interval is segmented into 
two parts at a changepoint, with (2) a single Poisson model for the whole interval. 
The decision whether to keep an interval unsegmented or to divide it into two 
subintervals is based on comparison of the corresponding marginal likelihoods. Let 
^{N, V) stand for the marginal likelihood corresponding to a Poisson model for a 
volume of size V containing A^ events, such as one of the two functions given above. 
Then an interval should be broken in two if 

$(iVi, Vi)^N,+i, y,+i) - $(Ar, V)>0 , (6) 

where the putative changepoint divides the interval of size V into two parts, of 
size Vi and Vi+i = V = Vi, containing Ni and Aj+i = N — Ni events, respectively. 
This criterion is easily computed as a function of the location of the changepoint. 
The interval is then segmented at the point that maximizes the expression in Eq. 
(6). This divide and conquer scherae is applied first to the whole data interval, and 
then iteratively to any subintervals found at the previous step. When this criterion 
favors segmentation of no further intervals, computation halts. 

Figure 1 shows the results for synthetic data generated by a piecewise constant 
Poisson process, between eleven known changepoints. I used a modified form of the 
Blocks function popularized by David Donoho, Iain Johnstone, and the WaveLab 
project [3] as a sample function with many discontinuities that can be detected 
using wavelet methods (see Fig. 1 in [27]). The original function has blocks of 
negative amplitude, which will not do as Poisson rates. Hence I added a constant, 
3|, to the classic Blocks function, and then generated points obeying the Poisson 
distribution. 

The three panels show three divide and conquer iterations. It can be seen that 
the various changepoint locations are found rather accurately, allbeit with the single 
changepoint algorithm. In the early steps the posterior has multiple peaks at the 
various changepoints, but only the highest one is used at each step. There is 
a tendency for changepoints determined early in the process to be less accurate. 
These errors are not corrected as the iteration proceeds, but the algorithm could 
be modified to do so. 



B Maximum Posterior for Ncp via MCMC 

It is clear that the above method is not rigorous, in that it does not solve for 
all of the changepoints simultaneously. It is relatively straightforward, however, to 
remedy this with a more rigorous, but also more computationally intensive scheme. 
If there are multiple changepoints, say Ncp in number, the full posterior is just the 
product of the posteriors of all the subintervals. The marginalization of all the 
locations of the changepoints requires evaluating the A'^cp-dimensional integral^ of 
the posterior over all values of the changepoint locations. 



^) In practice, this is a finite sum, e.g. over a bin index or an event index. 
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Results obtained in this way, using a simple Markov Chain Monte Carlo (MCMC) 
method, are surprisingly good. Figure 2 shows the block representation of the same 
data as in Figure 1, obtained from a simple MCMC evaluation of the marginal 
likelihood as a function of the number of changepoints. The maximum posterior 
was at 16 changepoints (17 blocks), compared to the correct value of 11. The 
"extra" blocks are narrow, and while they do not look pretty, they will not much 
affect derived quantities (such as pulse widths, rise times, etc). 

The good performance with only a few iterations may be due to a combination 
of several factors: 

• the well behaved nature of the data 

o low dynamic range of the signal 

o well behaved backgrounds 

o exact Poisson statistics for the observational uncertainties 

• a simple, exact Likelihood; only one parameter 

• the fact that the posterior does not have to be computed accurately to distin- 
guish one value of N^p from another 



• the useful characteristics of the final model are not that sensitive to Ncp 

Nevertheless, when the number of changepoints and the number of data points are 
large, the computation is quite long. Since there is little global communication, 
in the sense that the location of a change point in one part of the observation 
interval affects that of one elsewhere, breaking the data up into smaller intervals is 
an effective way to speed up the overall computation. 



C Cell Coalescence (Bottom Up) 



Based on preliminary tests, a new algorithm may be a significant improvement 
over either of the above approaches. It begins with a fine-grained segmentation, 
namely the Voronoi tessellation of the data points, and merges these many cells to 
form fewer, larger ones. The outline of the algorithm is simple: 
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Bayesian Cell Coalescence 


(0) 


Initial Se 


gmentation: Voronoi Tessellation of the Events 


(1) Compute 


Bayes Factor for Merging Each Pair of Adjacent Cells 


(2) Identify 


^argest Bayes Factor (at i) 


(3) 


If Largest Bayes Factor is < 1, HALT! 


(4) 


Otherwise Merge Pair of Cells with Largest Bayes Factor: 




• N,^ 


- N, + N,+, 




•V,^ 


V, + y.+i 




• Delete Cell i + 1 


(5) 


Goto 1 





While there is no rigorous justification for this procedure, one has the sense that 
it should come reasonably close to obtaining the optimal solution. At each step in 
the iteration the local event rate in a cell is ^, the number of events in the cell 
divided by its volume. The rate estimates for the initial cells, ■^, are reasonable 
estimates of the fine-grain, local event rate if these cells are assigned roughly their 
surrounding volume, extending approximately halfway to neighboring points. An 
obvious choice for the initial partition is thus the Voronoi tessellation [40,19] of the 
data points. The Voronoi cell for a data point consists of all the space closer to 
that point than to any other data point. 

The Voronoi tessellation of a set of points on an interval (ID) is trivial: it 
is simply the set of intervals spanned by pairs of midpoints between successive 
data points. Many algorithms exist for computing Voronoi tessellations in higher 
dimensions [21,13,32,11], allowing one to address problems such as cluster detection 
in parameter spaces, and identification of structures in images. Since the marginal 
posteriors discussed above are valid for piecewise constant Poisson data in any 
dimension, the methods demonstrated here for ID should apply in general. 

The decision whether to merge two cells or to halt. Step (3), is based on com- 
parison of the Bayes factors. Using the same notation as above, in Eq. (6), cells i 
and i + 1 are merged if 

$(iV, + iV,+i, V, + ^.+i) - ^{N,, \/.)$(A^.+i, \/.+i) > (7) 

and kept separate otherwise. When this criterion favors the merging of no further 
cells, computation halts. 



Figure 3 demonstrates the application of this algorithm to the same synthetic 
data as in the previous two examples. The top panel depicts the state part way 
into the iterations, but is not far from the initial Voronoi tessellation, one Voronoi 
cell for each data point. Successive panels show the evolution of the representation 
as the cells merge into fewer, larger ones. In the last panel, the halting criterion 
mentioned above has just terminated the iteration. It can be seen that most of the 
changepoints are accurately detected. Several though are missed. 

IV RELATED WORK 

It is well known that Bayesian methods are well-suited to finding changepoints 
{e.g. [30,34]; see also Appendix C of [16]). In [42] methods similar to those de- 
scribed here are used to find changepoints in binned data, to an accuracy better 
than the bin size. A number of recent publication are relevant to this problem 
[37,17,18,42,6,34,38,39,31,7] and [22-29]. More recently, Raftery and colleagues 
have developed similar methods, mainly for 2D problems. In [5] the Voronoi sites 
are fiducial markers for the cells, not the tessellation defined by the individual data 
points. Other approaches are described in [14,4,10,41,33]. Bayesian methods are 
also useful for segmentation of autoregressive models, with applications to speech 
processing [9]. 

After this article was completed, I became aware that NASA's Chandra X-ray 
Observatory, is using an unbinned source detection technique much like cell coales- 
cence, and starting from the same Voronoi tessellation [12]. Source cells are merged 
with a percolation process based on a sample estimate of the density distribution. 
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FIGURE 1. Block representations based on the divide and conquer algorithm. The piecewise 
homogeneous Poisson data were generated from a modification of Donoho's Blocks function. The 
actual changepoints used to generate the data are shown as vertical lines at the bottom, and the 
changepoints determined on the first three steps of the iteration are dashed lines. 
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FIGURE 2. Block representations based on the MCMC algorithm. The piecewise homogeneous 
Poisson data were generated from a modification of Donoho's Blocks function. The actual change- 
points used to generate the data are shown as vertical dashed lines. Spurious, narrow blocks have 
been emphasized (e.g. the one at about t = 0.76. 
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FIGURE 3. Block representations using the Cell Coalescence algorithm. The top panel shows 
the overly fine segmentation in the early stages of the iteration. In successive panels the process 
is converging toward a coarser representation. The bottom panel shows the first stage at which 
the Bayes factor contraindicates merging of all of the remaining blocks. The actual changcpoints 
used to generate the data are shown as vertical dotted lines. 
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